Abstract

Recovering a sparse signal from an undersampled set of random linear measurements is the main problem
of interest in compressed sensing. In this paper, we consider the case where both the signal and the
measurements are complex. We study the popular reconstruction method of $\ell_1$-regularized least squares
or LASSO. While several studies have shown that the LASSO algorithm offers desirable solutions under
certain conditions, the precise asymptotic performance of this algorithm in the complex setting is not
yet known. In this paper, we extend the approximate message passing (AMP) algorithm to complex
signals and measurements and obtain the complex approximate message passing algorithm (CAMP).
We then generalize the state evolution framework, recently introduced for the analysis of AMP, to the
complex setting. Using the state evolution, we derive accurate formulas for the phase transition and
noise sensitivity of both LASSO and CAMP.

1 Introduction 

Recovering a sparse signal from an undersampled set of random linear measurements is a problem of interest 
in compressed sensing (CS). In the past few years many algorithms have been proposed for signal recovery 
and their performances have been analyzed both analytically and empirically [1]-[6]. However, whereas
most of the theoretical work has focussed on the case of real-valued signals and measurements, in many 
applications, such as magnetic resonance imaging and radar, the signals are more easily representable in 
the complex domain. For such applications, often the real and imaginary components of a complex signal 
tend to be either zero or non-zero simultaneously. Therefore, recovery algorithms may benefit from this 
prior knowledge. Indeed the results presented in this paper confirm this intuition. 

Motivated by these observations, we investigate the performance of the complex-valued LASSO in
the case of noise-free and noisy measurements. The derivations are based on the state evolution (SE)
framework, presented previously in [3]. Also a new algorithm, complex approximate message passing
(CAMP), is presented to solve the complex LASSO. This algorithm is an extension of the AMP algorithm
[3], [7]. However, the extension of AMP and its analysis from the real to the complex setting is not trivial,
due to the different properties of the amplitude and phase components compared to the real-valued case.

In the remainder of this section, we briefly review some of the existing algorithms for sparse signal
recovery in the real-valued setting and then focus on recovery algorithms for the complex case, with
particular attention to the AMP and CAMP algorithms. We then introduce two criteria which are used
as measures of performance for the noiseless and noisy measurements respectively. Based on these criteria
we establish the novelty of our results compared to the existing work. An overview of the organization of
the paper is provided in Section 1.7.



1.1 Real-valued sparse recovery algorithms

Consider the problem of recovering a sparse vector $s_o \in \mathbb{R}^N$ from a noisy undersampled set of linear
measurements $y \in \mathbb{R}^n$, where $y = A s_o + w$ and $w$ is the noise. Let $k$ denote the number of nonzero
elements of $s_o$. The measurement matrix $A$ has iid elements from a given distribution on $\mathbb{R}$. Given $y$
and $A$, we seek an approximation to $s_o$. Many recovery algorithms have been proposed, ranging from
convex relaxation techniques to greedy approaches to iterative thresholding schemes. See [1] and the
references therein for an exhaustive list of algorithms. The authors of [6] compared several recovery algorithms
and concluded that, among the algorithms compared in that paper, $\ell_1$-regularized least squares, a.k.a.
LASSO or BPDN [2], [8], which seeks the minimizer of $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$, provides the best
performance in the sense of the sparsity-measurement tradeoff. Recently, several iterative thresholding
algorithms have been proposed for solving LASSO using few computations per iteration, which enables the
use of LASSO for high-dimensional problems. See [9] and the references therein for an exhaustive list of
these algorithms. In this paper, we are particularly interested in AMP [3]. Starting from $x^0 = 0$ and
$z^0 = y$, AMP uses the following iterations:

$$x^{t+1} = \eta(x^t + A^T z^t; \tau_t),$$
$$z^t = y - A x^t + \frac{|I^t|}{n} z^{t-1},$$

where $\eta(x; \tau) = (|x| - \tau)_+ \mathrm{sign}(x)$ is the soft thresholding function, $\tau_t$ is the threshold parameter and $I^t$
is the active set of $x^t$, i.e., $I^t = \{i \mid x_i^t \neq 0\}$. Furthermore, the theoretical prediction of the asymptotic
behavior of AMP is also accurate for LASSO [7], [10].
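As an illustration of the AMP recursion described in this subsection, the following NumPy sketch implements a single iteration. It is only a minimal sketch under our own assumptions: the function names `soft_threshold` and `amp_step` are hypothetical, the threshold `tau` is passed in by the caller rather than tuned, and the residual is updated with the active set of the newly computed estimate.

```python
import numpy as np

def soft_threshold(x, tau):
    """Real soft thresholding: eta(x; tau) = (|x| - tau)_+ * sign(x)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def amp_step(A, y, x, z, tau):
    """One AMP iteration (illustrative sketch, names are our own).

    Given the current estimate x and residual z, returns the next
    estimate and the next residual, including the correction term
    (|I| / n) * z, where I is the active set of the new estimate.
    """
    n = A.shape[0]
    x_next = soft_threshold(x + A.T @ z, tau)
    active = np.count_nonzero(x_next)          # |I|, size of the active set
    z_next = y - A @ x_next + (active / n) * z
    return x_next, z_next

# Tiny usage example with A = I, starting from x^0 = 0 and z^0 = y.
A = np.eye(2)
y = np.array([3.0, 0.5])
x1, z1 = amp_step(A, y, np.zeros(2), y.copy(), tau=1.0)
```

With the identity matrix the step can be checked by hand: the first coordinate is shrunk from 3 to 2, the second is set to zero, and the residual picks up half of the previous residual because one out of two coordinates is active.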



1.2 Complex-valued sparse recovery algorithms

Consider the complex setting where both the signal s and the measurements y are in the complex domain. 
The success of LASSO has motivated researchers to use similar techniques here as well. We consider the 
following two schemes that have been used in the signal processing literature: 

• r-LASSO: The simplest extension of the LASSO to the complex setting is to consider the complex
signal and measurements as a $2N$-dimensional real signal and $2n$-dimensional real measurements,
respectively. Let the superscripts $R$ and $I$ denote the real and imaginary parts of a complex number.
Define $\tilde{y} = [(y^R)^T, (y^I)^T]^T$ and $\tilde{s}_o = [(s_o^R)^T, (s_o^I)^T]^T$, where the superscript $T$ denotes the transpose
operator. We have

$$\tilde{y} = \underbrace{\begin{pmatrix} A^R & -A^I \\ A^I & A^R \end{pmatrix}}_{\tilde{A}} \tilde{s}_o.$$

We then search for an approximation of $\tilde{s}_o$ by solving $\arg\min_x \frac{1}{2}\|\tilde{y} - \tilde{A}x\|_2^2 + \lambda\|x\|_1$. We call
this algorithm r-LASSO. The limit of the solution as $\lambda \to 0$ is

$$\arg\min_x \|x\|_1, \quad \text{s.t. } \tilde{y} = \tilde{A}x,$$

which is called the basis pursuit problem, or r-BP, in this paper. It is straightforward to extend the
analyses of LASSO and BP for real signals to r-LASSO and r-BP.¹ However, this approach ignores
the information about the grouping of the real and imaginary parts. In fact, in many applications
the real and imaginary components tend to be small or large simultaneously.

• c-LASSO: The more natural extension of the LASSO to the complex setting is the following
optimization problem, which we term c-LASSO:

$$\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1,$$

where the complex $\ell_1$-norm is defined as $\|x\|_1 = \sum_i |x_i| = \sum_i \sqrt{(x_i^R)^2 + (x_i^I)^2}$. The limit
of the solution as $\lambda \to 0$ is

$$\arg\min_x \|x\|_1, \quad \text{s.t. } y = Ax,$$

which we refer to as c-BP.
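The difference between the two formulations can be made concrete with a small numerical sketch. The helper names below (`real_embedding`, `stack`, `complex_l1`, `stacked_l1`) are hypothetical and chosen only for this illustration; the random matrix is an arbitrary example, not a recommended measurement ensemble. The sketch checks that the real embedding reproduces the complex measurements and compares the penalty that each formulation assigns to the same signal.

```python
import numpy as np

def real_embedding(A):
    """Real 2n x 2N matrix [[A^R, -A^I], [A^I, A^R]] used by r-LASSO."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def stack(v):
    """Stack real and imaginary parts: [v^R; v^I]."""
    return np.concatenate([v.real, v.imag])

def complex_l1(x):
    """Complex l1-norm: sum of moduli (penalty used by c-LASSO)."""
    return float(np.sum(np.abs(x)))

def stacked_l1(x):
    """l1-norm of the stacked real vector (penalty used by r-LASSO)."""
    return float(np.sum(np.abs(stack(x))))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
s = np.zeros(5, dtype=complex)
s[1] = 3 + 4j                      # one non-zero entry with paired real/imaginary parts
y = A @ s

same_measurements = np.allclose(real_embedding(A) @ stack(s), stack(y))
penalties = (complex_l1(s), stacked_l1(s))   # modulus 5 versus |3| + |4| = 7
```

The complex norm charges the paired entry once, through its modulus, whereas the stacked real norm charges its real and imaginary parts separately. This is the sense in which r-LASSO ignores the grouping.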

An important question we address in this paper is: can we measure how much the grouping of the real
and the imaginary parts improves the performance of c-LASSO? Several papers have considered similar
problems [18]-[35] and have provided guarantees on the performance of this algorithm. However, the
results are usually inconclusive because of the loose constants involved in their analyses. This paper aims
to address the above question with an analysis that does not involve any loose constants and therefore
provides accurate comparisons.

Motivated by the recent results in the asymptotic analysis of the LASSO [3], [7], we first derive the
complex approximate message passing algorithm (CAMP) as a fast and efficient algorithm for solving
the c-LASSO problem. We then extend the state evolution (SE) framework introduced in [3] to predict
the performance of the CAMP algorithm in the asymptotic setting. Since the CAMP algorithm solves
c-LASSO, such predictions are also accurate for c-LASSO as $N \to \infty$. The analysis carried out in
this paper provides new information and insight on the performance of c-LASSO that was not known
before, such as the least favorable distribution and the noise sensitivity of c-LASSO and CAMP. A more
detailed description of the contributions of this paper is given in Section 1.5.



1.3 Notation 



Let $|a|$, $\angle a$ and $a^*$ denote the amplitude, phase, and conjugate of $a \in \mathbb{C}$ respectively. Furthermore, for the
matrix $A \in \mathbb{C}^{n \times N}$, $A^*$, $A_i$, $A_{ij}$ denote the conjugate transpose, the $i$th column and the $ij$th element of $A$.

We are interested in approximating a sparse signal $s_o \in \mathbb{C}^N$ from an undersampled set of noisy linear
measurements $y = A s_o + w$. $A \in \mathbb{C}^{n \times N}$ has iid random elements (with independent real and imaginary parts)
from a given distribution that satisfies $E A_{ij} = 0$ and $E|A_{ij}|^2 = \frac{1}{n}$, and $w \in \mathbb{C}^n$ is the measurement noise.
Throughout the paper, we assume that the noise is iid $CN(0, \sigma^2)$, where $CN$ stands for the complex normal
distribution.

We are interested in the asymptotic setting where $\delta = n/N$ and $\rho = k/n$ are fixed, while $N \to \infty$. We
further assume that the elements of $s_o$ are iid, $s_{o,i} \sim (1 - \rho\delta)\delta_0(s_{o,i}) + \rho\delta G(s_{o,i})$, where $G$ is an unknown
probability distribution without any point mass at $0$.² Clearly, the expected number of non-zero elements
in the vector $s_o$ is $\rho\delta N$. In this model we are assuming that all the non-zero real and imaginary coefficients
are paired. This quantifies the maximum amount of improvement the c-LASSO gains by grouping the real
and imaginary parts.

1 The theoretical results on LASSO and BP consider iid Gaussian measurement matrices [13]. However, it is
conjectured that the results are universal and hold for a "larger" class of random matrices [7], [14].

2 This assumption is not necessary and as long as the marginal distribution of $s_o$ converges to a given distribution the
statements of this paper will hold. For further information on this refer to [7] and [10].

Define $\mathcal{F}_{\epsilon,\gamma}$ as the family of distributions $F$ with $F(0^+) - F(0^-) \ge 1 - \epsilon$ and $E_F(X^2) \le \epsilon\gamma^2$. Also,
define $\mathcal{F}_\epsilon = \{F \mid F(0^+) - F(0^-) \ge 1 - \epsilon\}$.



1.4 Performance criteria 

We compare the c-LASSO with r-LASSO in both the noise-free and noisy measurements cases. For each 
scenario, we define a specific measure to compare the performance of the two algorithms. 



1.4.1 Noise-free measurements

Let $\mathcal{A}_\alpha$ be a sparse recovery algorithm with free parameter $\alpha$. Given $(y, A)$, $\mathcal{A}_\alpha$ returns an estimate
$\hat{x}_{\mathcal{A}_\alpha}$ of $s_o$. Suppose that in the noise-free case, as $N \to \infty$, the performance of $\mathcal{A}_\alpha$ exhibits a sharp phase
transition, i.e., for every value of $\delta$, there exists $\rho_{\mathcal{A}_\alpha}(\delta)$, below which $\|\hat{x}_{\mathcal{A}_\alpha} - s_o\|_2^2/N \to 0$ almost
surely, while for $\rho > \rho_{\mathcal{A}_\alpha}(\delta)$, $\mathcal{A}_\alpha$ fails with probability 1. The phase transition has been studied either
empirically or theoretically for many sparse recovery algorithms [6], [13], [14], [36]-[39]. The phase transition
curve $\rho_{\mathcal{A}_\alpha}(\delta)$ specifies the fundamental exact recovery limit of algorithm $\mathcal{A}_\alpha$.

The free parameter $\alpha$ often affects the performance of the sparse recovery algorithm [6]. Therefore, optimal
tuning of this parameter is essential in practical applications. One approach is to tune the parameter for
the highest phase transition [6],³ i.e.,

$$\rho_{\mathcal{A}}(\delta) = \sup_{\alpha} \rho_{\mathcal{A}_\alpha}(\delta).$$

In other words, $\rho_{\mathcal{A}}$ is the best performance $\mathcal{A}_\alpha$ provides in the exact sparse signal recovery problem, if we
know how to tune the algorithm properly. Based on this framework, we say algorithm $\mathcal{A}$ outperforms $\mathcal{B}$
at a given $\delta$ if and only if $\rho_{\mathcal{A}}(\delta) > \rho_{\mathcal{B}}(\delta)$.



1.4.2 Noisy measurements 

In the presence of measurement noise exact recovery is not possible. Therefore, tuning the parameter for
the highest phase transition curve does not necessarily provide the optimal performance. In this section,
we explain the optimal noise sensitivity tuning introduced in [7]. Consider the $\ell_2$-norm as a measure for
the reconstruction error and assume that $\frac{\|\hat{x}_{\mathcal{A}_\alpha} - s_o\|_2^2}{N} \to \mathrm{MSE}(\rho, \delta, \alpha, \sigma)$ almost surely. Define the noise
sensitivity of the algorithm as

$$\mathrm{NS}(\rho, \delta, \alpha) = \sup_{\sigma > 0} \sup_{G} \frac{\mathrm{MSE}(\rho, \delta, \alpha, \sigma)}{\sigma^2},$$

where $\alpha$ denotes the tuning parameter of algorithm $\mathcal{A}_\alpha$. If the noise sensitivity is large, then the
measurement noise may severely degrade the final reconstruction. Therefore, we tune the parameter $\alpha$ to obtain
the lowest noise sensitivity, i.e.,

$$\mathrm{NS}(\rho, \delta) = \inf_{\alpha} \mathrm{NS}(\rho, \delta, \alpha).$$

Based on this framework, we say algorithm $\mathcal{A}$ outperforms $\mathcal{B}$ at a given $\delta$ and $\rho$ if and only if
$\mathrm{NS}_{\mathcal{A}}(\delta, \rho) < \mathrm{NS}_{\mathcal{B}}(\delta, \rho)$.



3 In this paper, we are considering algorithms whose phase transitions do not depend on the distribution $G$ of non-zero
coefficients. Otherwise, one could use the maximin framework introduced in [6].

Figure 1: Comparison of the phase transition curves of r-BP and c-BP. When all the non-zero real and
imaginary parts of the signal are grouped, the phase transition of c-BP outperforms that of r-BP.



1.5 Contributions 

In this paper, we first develop the complex approximate message passing (CAMP) algorithm. This
algorithm provides a simple and fast converging iterative method for solving c-LASSO. We extend the
state evolution, introduced recently as a framework for accurate asymptotic predictions of the AMP
performance, to CAMP. We then use the connection between CAMP and c-LASSO to provide an
accurate asymptotic analysis of the c-LASSO algorithm. We aim to characterize the phase transition curve
(noise-free measurements) and noise sensitivity (noisy measurements) of these two algorithms when the real
and imaginary parts are all paired, i.e., they are both zero or non-zero. Both criteria have been
extensively studied for real signals (and hence for r-LASSO) [3], [7]. The results of our predictions are
summarized in Figures 1, 2 and 3. Figure 1 compares the phase transition curve of c-BP and CAMP with
the phase transition curve of r-BP. As expected, c-BP and CAMP outperform r-BP since they exploit
the connection between the real and imaginary parts. If $\rho_{SE}(\delta)$ denotes the phase transition curve, we
also prove that $\rho_{SE}(\delta) \sim \frac{1}{\log(1/2\delta)}$ as $\delta \to 0$. Comparing this with $\rho^R_{SE}(\delta) \sim \frac{1}{2\log(1/\delta)}$ for r-LASSO, we
conclude that

$$\lim_{\delta \to 0} \frac{\rho_{SE}(\delta)}{\rho^R_{SE}(\delta)} = 2.$$

Figure 2 exhibits the noise sensitivity of c-LASSO and CAMP. We prove in Section 3.3 that, as the sparsity
approaches the phase transition curve, the noise sensitivity grows to infinity. Finally, Figure 3 compares
the contour plots of the noise sensitivity of c-LASSO with those of r-LASSO. For a fixed value of the noise
sensitivity, the level set of c-LASSO is higher than that of r-LASSO. It is worth noting that the same
comparisons hold between CAMP and AMP, as will be clarified in Section 3.4.



1.6 Related work 

The state evolution framework used in this paper was first introduced in [3]. Deriving the phase transition
and noise sensitivity of the LASSO for real signals and real measurements from SE is due to [7]; see [40]
for a more comprehensive discussion. Finally, the derivation of AMP from the full sum-product message
passing is due to [41]. Our main contribution in this paper is to extend these results to the complex
setting. Not only is the analysis of the state evolution more challenging in this setting, but it also provides
new insights on the performance of c-LASSO that have not been available before. For instance, the noise
sensitivity of c-LASSO is not yet known.

Figure 2: Contour lines of noise sensitivity in the $(\delta, \rho)$ plane. The black curve is the phase transition
curve at which the noise sensitivity is infinite. The colored lines display the level sets of $\mathrm{NS}(\rho, \delta) =
0.125, 0.25, 0.5, 1, 2, 4, 8$.

Figure 3: Comparison of the noise sensitivity of r-LASSO with the noise sensitivity of c-LASSO. The
colored solid lines present the level sets of $\mathrm{NS}(\rho, \delta) = 0.125, 0.5, 2$ for c-LASSO and the colored
dotted lines display the same level sets for r-LASSO.



The recovery of sparse complex signals is a special case of group-sparsity or block-sparsity. According
to the group sparsity assumption, the non-zero elements of the signal tend to occur in groups or clusters.
One of the algorithms used in this context is the group-LASSO [29], [31]. Consider a signal $s_o \in \mathbb{R}^N$.
Partition the indices of $s_o$ into $m$ groups $g_1, \ldots, g_m$. The group-LASSO algorithm minimizes the following
cost function:

$$\min_x \frac{1}{2}\|y - Ax\|_2^2 + \sum_{i=1}^{m} \lambda_i \|x_{g_i}\|_2, \tag{1}$$



where the $\lambda_i$'s are regularization parameters. This algorithm has been studied extensively in the
literature [18]-[35]. We briefly review several papers and emphasize the differences from our work. The work in [32]
analyzes the consistency of the group LASSO estimator in the presence of noise. Fixing the signal $s_o$, it
provides conditions under which the group LASSO is consistent as $n \to \infty$. The papers [33], [42] consider a weak
notion of consistency, namely exact support recovery. However, [42] proves that in the setting we are interested
in, i.e., $k/n = \rho$ and $n/N = \delta$, even exact support recovery is not possible. Clearly, when noise
is present our goal is neither exact recovery nor exact support recovery. Instead, we characterize
the mean square error of the reconstruction. This criterion has been considered in [18], [34]. Although the
results of [18], [34] show qualitatively the benefit of group sparsity, they do not characterize the difference
quantitatively. In fact, loose constants in both the error bound and the number of samples do not permit
accurate comparison of the performances. In our analysis, no loose constant is involved, and we provide
a very accurate characterization of the mean square error.



Group-sparsity and group-LASSO are also of interest in the sparse recovery community. For example,
the analyses carried out in [20], [23], [24] are based on "coherence". These results provide sufficient conditions,
again with loose constants as discussed before. The work of [25]-[27] addresses this issue by an accurate
analysis of the algorithm in the noiseless setting $\sigma = 0$. They provide a very accurate estimate of the
phase transition curve for the group-LASSO. However, SE provides a more flexible framework to analyze
c-LASSO than the analysis of [27], and it provides more information than just the phase transition curve.
For instance, it shows the least favorable distribution of the input and the noise sensitivity of c-LASSO.



While writing this paper we were made aware that in an independent work Donoho and Montanari are 
extending the state evolution framework to the general setting of group sparsity [43] . Their work considers 
the state evolution framework for the group-LASSO problem and will include the generalization of the 
analysis provided in this paper to the case where the variables tend to cluster in groups of size m. 



Also, both complex signals and group-sparse signals are special cases of model-based compressed
sensing (CS) [44]. By introducing more structured models for the signal, [44] proves that the number of
measurements needed is proportional to the "complexity" of the model rather than the sparsity level.
The results in model-based CS also suffer from loose constants in both the number of measurements and
the mean square error bounds.



Finally, from the algorithmic point of view, several papers have considered solving the c-LASSO
problem using first-order methods [4], [15]. The deterministic framework that measures the convergence of an
algorithm on the most difficult problem instance, which gives the slowest convergence rate, is not an
appropriate measure of the convergence rate for compressed sensing problems. Therefore, [9] considers
the average convergence rate for iterative algorithms. In that setting, AMP is the only first-order algorithm
that provably achieves linear convergence to date. Similarly, the CAMP algorithm, introduced in this
paper, is the first first-order c-LASSO solver that provides a linear average convergence rate.



1.7 Organization of the paper 

We introduce the CAMP algorithm in Section 2. We then explain the state evolution equations, which
characterize the evolution of the mean square error through the iterations of the CAMP algorithm, in Section 3,
and we analyze the important properties of the state evolution equations. We then discuss the connection
between our calculations and the solution of LASSO in Section 3.4. We confirm our results through
Monte Carlo simulations in Section 4.

2 Complex Approximate Message Passing 

The high computational complexity of interior point methods for solving large convex optimization
problems spurred the development of first-order methods for solving the LASSO problem. See [9] and the
references therein for a description of some of these algorithms. One of the most successful algorithms
for compressed sensing problems is the AMP algorithm introduced in [3]. In this section, we use the
approach introduced in [41] to derive the approximate message passing algorithm for the c-LASSO problem
that we term Complex Approximate Message Passing (CAMP).

Let $s_1, s_2, \ldots, s_N$ be $N$ random variables with the following distribution:

$$p(s_1, s_2, \ldots, s_N) = \frac{1}{Z(\beta)} e^{-\beta\lambda\|s\|_1 - \frac{\beta}{2}\|y - As\|_2^2}, \tag{2}$$

where $\beta$ is a constant and $Z(\beta) = \int_s e^{-\beta\lambda\|s\|_1 - \frac{\beta}{2}\|y - As\|_2^2}\, ds$. As $\beta \to \infty$ the mass of this distribution
concentrates around the solution of the LASSO. Therefore, one way to find the solution of LASSO is to
marginalize this distribution. However, calculating the marginal distribution is an NP-complete problem.
The sum-product message passing algorithm provides a successful heuristic for approximating the marginal
distribution. As $N \to \infty$ and $\beta \to \infty$ the iterations of the sum-product message passing algorithm
simplify to [41]



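$$x^{t+1}_{i \to a} = \eta\Big(\sum_{b \neq a} A^*_{bi} z^t_{b \to i};\ \tau_t\Big),$$
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj} x^t_{j \to a}, \tag{3}$$

where $\eta(u + iv; \lambda) = \Big(u + iv - \frac{\lambda(u + iv)}{\sqrt{u^2 + v^2}}\Big)\mathbb{1}_{\{u^2 + v^2 > \lambda^2\}}$ is the proximity operator of the complex $\ell_1$-norm
and is called complex soft thresholding. See Appendix A for further information regarding this function.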
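The complex soft thresholding function shrinks the modulus of its input and leaves the phase untouched. A minimal NumPy sketch is given below; the name `complex_soft_threshold` and the vectorized form are our own choices for illustration and are not taken from the original derivation.

```python
import numpy as np

def complex_soft_threshold(x, lam):
    """Complex soft thresholding eta(x; lam), applied elementwise.

    Entries with modulus at most lam are set to zero; all other
    entries keep their phase and have their modulus reduced by lam.
    """
    x = np.asarray(x, dtype=complex)
    mag = np.abs(x)
    keep = mag > lam
    safe_mag = np.where(keep, mag, 1.0)        # avoids division by zero
    scale = np.where(keep, 1.0 - lam / safe_mag, 0.0)
    return scale * x

# 3 + 4j has modulus 5, so with lam = 1 it is scaled by 4/5.
out = complex_soft_threshold([3 + 4j, 0.3 + 0.4j, 0.0], 1.0)
```

Note that thresholding the real and imaginary parts separately would map $3 + 4i$ to $2 + 3i$ and therefore change the phase, whereas the complex version only changes the modulus.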



$\tau_t$ is the threshold parameter at time $t$. The choice of this parameter will be discussed in Section 3.1. The
per-iteration computational complexity of this algorithm is high since $2nN$ messages $x^t_{i \to a}$ and $z^t_{a \to i}$ are
updated. Therefore, following [41] we assume that

$$x^t_{i \to a} = x^t_i + \delta x^t_{i \to a} + O(1/N),$$
$$z^t_{a \to i} = z^t_a + \delta z^t_{a \to i} + O(1/N), \tag{4}$$

where $\delta x^t_{i \to a}, \delta z^t_{a \to i} = O(\frac{1}{\sqrt{N}})$. Here, the $O(\cdot)$ errors are uniform in the choice of the edges $i \to a$ and
$a \to i$. For further discussion of this assumption and its validation see [41].

4 First-order methods are iterative algorithms that use either the gradient or the subgradient of the function at the previous
iterations to update their estimates.

Let $\eta^I$ and $\eta^R$ be the imaginary and real parts of the complex soft thresholding function. Furthermore,
define $\partial_1 \eta^R$ and $\partial_2 \eta^R$ as the partial derivatives of $\eta^R$ with respect to the real and imaginary parts of the
input respectively. $\partial_1 \eta^I$ and $\partial_2 \eta^I$ are defined similarly. We then have



Proposition 2.1. Suppose that (4) holds for every iteration of the message passing algorithm specified
in (3). Then $x^t_i$ and $z^t_a$ satisfy the following equations:

b 

Za = Aajx) - ^ A aj I (x* 1 + ^ A *b,j z b ^ ) ^iKi/j^a) 



in 




(5) 



where the $\mathcal{R}$ and $\mathcal{I}$ operators return the real and imaginary parts of a complex number respectively.



See Appendix B for the proof. According to Proposition 2.1 and (4), for large values of $N$, the messages
$x^t_{i \to a}$ and $z^t_{a \to i}$ are close to $x^t_i$ and $z^t_a$ in (5). Therefore, we define the CAMP algorithm as the iterative
method that starts from $x^0 = 0$ and $z^0 = y$ and uses the iterations specified in (5). It is important to note
that Proposition 2.1 does not provide any information on either the performance of the CAMP algorithm
or the connection between CAMP and c-LASSO, since message passing is a heuristic algorithm and
does not necessarily converge to the correct marginal distribution of (2).



3 Formal analysis of CAMP and c-LASSO 

In this section, we explain the state evolution framework as a framework that predicts the performance of 
the CAMP and c-LASSO in the asymptotic settings. We then use this framework to analyze the phase 
transition and noise sensitivity of the c-LASSO and CAMP. 



3.1 State evolution 



We now conduct an asymptotic analysis of the CAMP algorithm. As we confirm in Section 3.4, the
asymptotic performance of the algorithm is tracked through a few variables, called the state variables. The
state of the algorithm is the 5-tuple $s = (m; \delta, \rho, \sigma, G)$, where $G$ corresponds to the distribution of the
non-zero elements of the sparse vector $s_o$, $\sigma$ is the standard deviation of the measurement noise, and $m$
is the asymptotic normalized mean square error. The threshold parameter (threshold policy) in its most
general form could be a function of the state of the algorithm, $\tau(s)$. Define $\mathrm{npi}(m; \sigma, \delta) = \sigma^2 + \frac{m}{\delta}$. The
MSE map is defined as

$$\Psi(s, \tau(s)) = E\left|\eta\big(X + \sqrt{\mathrm{npi}(m, \sigma, \delta)}\, Z_1 + i\sqrt{\mathrm{npi}(m, \sigma, \delta)}\, Z_2;\ \tau(s)\big) - X\right|^2,$$

where $Z_1, Z_2 \sim N(0, 1/2)$ and $X \sim (1 - \rho\delta)\delta_0(x) + \rho\delta G(x)$ are independent random variables. Note
that $G$ is a probability distribution on $\mathbb{C}$. In the rest of this paper, we consider the thresholding policy
$\tau(s) = \tau\sqrt{\mathrm{npi}(m, \sigma, \delta)}$, where the constant $\tau$ is yet to be tuned according to the schemes introduced in
Sections 1.4.1 and 1.4.2. When we use this thresholding policy we may equivalently write $\Psi(s, \tau(s))$ as
$\Psi(s, \tau)$. This thresholding policy is the same as the thresholding policy introduced in [3], [7]. When the
parameters $\delta, \rho, \sigma, \tau$ and $G$ are clear from the context, we denote the MSE map by $\Psi(m)$. The state
evolution is the evolution of $m$ by the rule

$$m_{t+1} = \Psi(m_t).$$

As will be described in Section 3.4, this equation tracks the normalized MSE of the CAMP algorithm in
the asymptotic setting $N \to \infty$. In other words, if $m_t$ is the mean square error of the CAMP algorithm at
iteration $t$, $m_{t+1}$ will be the MSE of CAMP at iteration $t + 1$.
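To make the state evolution recursion concrete, the following sketch estimates the MSE map by Monte Carlo and iterates it. Everything specific in it is an assumption made for illustration only: the function names, the choice of unit-modulus non-zero entries with uniform phase as the distribution of the non-zero elements, the sample size, and the use of a fixed sample set so that the estimated map is a deterministic function of the current MSE.

```python
import numpy as np

def complex_soft_threshold(x, lam):
    """Complex soft thresholding: shrink the modulus by lam, keep the phase."""
    mag = np.abs(x)
    keep = mag > lam
    safe_mag = np.where(keep, mag, 1.0)
    return np.where(keep, 1.0 - lam / safe_mag, 0.0) * x

def make_samples(rho, delta, num, rng):
    """Samples of X (a fraction rho*delta is non-zero, unit modulus) and Z1 + i*Z2."""
    k = int(round(rho * delta * num))
    x = np.zeros(num, dtype=complex)
    x[:k] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, k))
    z = rng.normal(0.0, np.sqrt(0.5), num) + 1j * rng.normal(0.0, np.sqrt(0.5), num)
    return x, z

def mse_map(m, delta, sigma, tau, x, z):
    """Monte Carlo estimate of the MSE map with threshold tau * sqrt(npi)."""
    npi = sigma**2 + m / delta
    scale = np.sqrt(npi)
    estimate = complex_soft_threshold(x + scale * z, tau * scale)
    return float(np.mean(np.abs(estimate - x) ** 2))

def state_evolution(m0, delta, sigma, tau, x, z, iters):
    """Iterate m_{t+1} = Psi(m_t) and return the whole trajectory."""
    history = [m0]
    for _ in range(iters):
        history.append(mse_map(history[-1], delta, sigma, tau, x, z))
    return history

rng = np.random.default_rng(0)
x, z = make_samples(rho=0.2, delta=0.5, num=10000, rng=rng)
trajectory = state_evolution(m0=0.1, delta=0.5, sigma=0.0, tau=1.0, x=x, z=z, iters=30)
```

In this noise-free example the sparsity level is chosen well inside the region where recovery is expected, so the trajectory should decay towards zero; choosing a larger `rho` or a poorly tuned `tau` would change that behavior.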

Definition 3.1. Let $\Psi$ be almost everywhere differentiable. $m^*$ is called a fixed point of $\Psi$ if and only if
$\Psi(m^*) = m^*$. Furthermore, a fixed point is called stable if $\frac{d\Psi(m)}{dm}\Big|_{m = m^*} < 1$, and unstable if
$\frac{d\Psi(m)}{dm}\Big|_{m = m^*} > 1$.

It is clear that if $m^*$ is the unique stable fixed point of the $\Psi$ function, then $m_t \to m^*$ as $t \to \infty$. Also,
if all the fixed points of $\Psi$ are unstable, then $m_t \to \infty$ as $t \to \infty$. Let $\mu = |X|$ and $\theta = \angle X$. Also, let
$G(\mu, \theta)$ denote the probability density function of $X$ and let $G(\mu) = \int G(\mu, \theta)\, d\theta$ be the marginal density of $\mu$.



Lemma 3.2. The MSE map does not depend on the phase distribution of the input signal, i.e.,

$$\Psi(m, \delta, \rho, \sigma, G(\mu, \theta), \tau) = \Psi(m, \delta, \rho, \sigma, G(\mu), \tau).$$

See Appendix C for the proof.



3.2 Noise-free signal recovery 

Suppose that the measurements are noise-free, i.e., $\sigma = 0$. Fix all the state variables except for $m$ and $\rho$.
The evolution of $m$ discriminates the following two regions for $\rho$:

Region I: The values of $\rho$ for which $\Psi(m) < m$ for every $m > 0$;

Region II: The complement of Region I.

Since $0$ is necessarily a fixed point of the $\Psi$ function, in Region I $m_t \to 0$ as $t \to \infty$. The following
lemma shows that in the second region $m = 0$ is an unstable fixed point and therefore, starting from $m_0 \neq 0$,
$m_t \nrightarrow 0$.

Lemma 3.3. Let $\sigma = 0$. If $\rho$ is in Region II, then $\Psi$ has an unstable fixed point at zero.
Proof. We prove in Lemma D.1 that $\Psi(m)$ is a concave function of $m$. Therefore, $\rho$ is in Region II if and
only if $\frac{d\Psi(m)}{dm}\Big|_{m=0} > 1$. This in turn indicates that $0$ is an unstable fixed point. □



It is also easy to confirm that Region I is of the form $[0, \rho_{SE}(\delta, G, \tau))$. As we will see in Section 3.4,
$\rho_{SE}(\delta, G, \tau)$ determines the phase transition curve of the CAMP algorithm. According to Lemma 3.2 the
MSE map does not depend on the phase distribution of the non-zero elements. However, it depends on
$G(\mu)$. The following proposition shows that in fact $\rho_{SE}$ is independent of $G$ even though the $\Psi$ function
does depend on $G(\mu)$.

Proposition 3.4. $\rho_{SE}(\delta, G, \tau)$ is independent of the distribution $G$.

Proof. According to Appendix D, $\Psi$ is concave. Therefore, it has a stable fixed point at zero if and only if
its derivative at zero is less than 1. It is also straightforward (from Appendix D) to show that

$$\frac{d\Psi}{dm}\Big|_{m=0} = \frac{\rho\delta(1 + \tau^2)}{\delta} + \frac{1 - \rho\delta}{\delta} E|\eta(Z_1 + iZ_2; \tau)|^2.$$

Setting this derivative to 1, it is clear that the phase transition value of $\rho$ is independent of $G$.

□
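The condition that the derivative of the MSE map at zero equals 1 can be solved for $\rho$ at a fixed threshold and then maximized over the threshold numerically. The sketch below does this on a grid, using the identity $E|\eta(Z_1 + iZ_2; \tau)|^2 = 2\int_{\omega > \tau} \omega(\omega - \tau)^2 e^{-\omega^2} d\omega$. The function names, the trapezoidal quadrature, the truncation of the integral and the grid of thresholds are all our own choices for illustration, so the output should be read as an approximation rather than as the exact curve.

```python
import numpy as np

def noise_risk(tau, width=10.0, num=4001):
    """2 * integral over w > tau of w (w - tau)^2 exp(-w^2), by the trapezoidal rule.

    The integral is truncated at tau + width, where the integrand is negligible.
    """
    w = np.linspace(tau, tau + width, num)
    f = 2.0 * w * (w - tau) ** 2 * np.exp(-w ** 2)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(w)))

def rho_at_threshold(delta, tau):
    """Value of rho at which the derivative of the MSE map at zero equals 1."""
    e = noise_risk(tau)
    return (delta - e) / (delta * (1.0 + tau ** 2 - e))

def rho_se(delta, taus=np.arange(0.01, 5.0, 0.01)):
    """Grid approximation of sup over tau of rho_at_threshold(delta, tau)."""
    return max(0.0, max(rho_at_threshold(delta, tau) for tau in taus))

curve = {delta: rho_se(delta) for delta in (0.1, 0.5, 0.9)}
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid is used here only to keep the sketch free of extra dependencies.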



According to Proposition 3.4 the only parameters that affect $\rho_{SE}$ are $\delta$ and the free parameter $\tau$. As
mentioned in Section 1.4.1, one approach in tuning $\tau$ is to set it such that the algorithm achieves its highest
phase transition, i.e.,

$$\rho_{SE}(\delta) = \sup_{\tau} \rho_{SE}(\delta; \tau).$$

Using the state evolution we can calculate the optimal value of $\tau$ and $\rho_{SE}(\delta)$.
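Theorem 3.5. $\rho_{SE}(\delta)$ and $\delta$ satisfy the following implicit relations:

$$\rho_{SE}(\delta) = \frac{\int_{\omega > \tau} \omega(\tau - \omega) e^{-\omega^2} d\omega}{(1 + \tau^2)\int_{\omega > \tau} \omega(\tau - \omega) e^{-\omega^2} d\omega - \tau \int_{\omega > \tau} \omega(\omega - \tau)^2 e^{-\omega^2} d\omega},$$

$$\delta = \frac{4(1 + \tau^2)\int_{\omega > \tau} \omega(\tau - \omega) e^{-\omega^2} d\omega - 4\tau \int_{\omega > \tau} \omega(\omega - \tau)^2 e^{-\omega^2} d\omega}{-2\tau + 4\int_{\omega > \tau} \omega(\tau - \omega) e^{-\omega^2} d\omega},$$

for $\tau \in [0, \infty)$.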



See Appendix E for the proof. Figure 1 displays this phase transition curve. The implicit formulation
above allows us to calculate the asymptotic behavior of the phase transition as $\delta \to 0$.


Theorem 3.6. $\rho_{SE}(\delta)$ follows the asymptotic behavior

$$\rho_{SE}(\delta) \sim \frac{1}{\log(1/2\delta)}, \quad \text{as } \delta \to 0.$$

See Appendix F for the proof. As mentioned before, this theorem shows that as $\delta \to 0$ the phase transition
of c-LASSO and CAMP is twice that of r-LASSO. This is the benefit of grouping the real and
imaginary parts in the noise-free case. We discuss optimal strategies for noisy cases in Section 3.3.



3.3 Noise sensitivity 

We first discuss the risk of the complex soft thresholding function. The properties of this risk play an
important role in the discussion of the noise sensitivity of state evolution in Section 3.3.2.



3.3.1 Risk of soft thresholding 

Define the risk of the soft thresholding function as

$$r(\mu, \tau) = E\left|\eta(\mu e^{j\theta} + Z_1 + iZ_2; \tau) - \mu e^{j\theta}\right|^2,$$

where the expected value is with respect to the two independent random variables $Z_1, Z_2 \sim N(0, 1/2)$. It
is important to note that according to Lemma 3.2 the risk function is independent of $\theta$. The following
lemma characterizes two important properties of this risk function:

Lemma 3.7. $r(\mu, \tau)$ is an increasing function of $\mu$ and a concave function in terms of $\mu^2$.

See Appendix G for the proof of this lemma. We define the minimax risk of the soft thresholding function
as

$$M^\flat(\epsilon) = \inf_{\tau > 0} \sup_{\nu \in \mathcal{F}_\epsilon} E|\eta(X + Z_1 + iZ_2; \tau) - X|^2,$$

where $\nu$ is the distribution of $X$ and the expected value is with respect to $X$, $Z_1$ and $Z_2$.

Proposition 3.8. The minimax risk of the soft thresholding function satisfies

$$M^\flat(\epsilon) = \inf_{\tau}\ 2(1 - \epsilon)\int_{\omega = \tau}^{\infty} \omega(\omega - \tau)^2 e^{-\omega^2} d\omega + \epsilon(1 + \tau^2).$$

See Appendix H for the proof. We use this minimax risk to derive the noise sensitivity of the state
evolution in the next section.



3.3.2 Noise sensitivity of state evolution 



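As mentioned in Section 3.1, in the presence of measurement noise the state evolution is given by

$$m_{t+1} = \Psi_\sigma(m_t),$$
$$\Psi_\sigma(m_t) = E\left|\eta\big(X + \sqrt{\mathrm{npi}}\, Z_1 + i\sqrt{\mathrm{npi}}\, Z_2;\ \tau\sqrt{\mathrm{npi}}\big) - X\right|^2,$$

where $\mathrm{npi} = \sigma^2 + \frac{m_t}{\delta}$.

Lemma 3.9. $\Psi_\sigma(m)$ has a unique stable fixed point to which the sequence $\{m_t\}$ converges.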



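We call this fixed point $\mathrm{fMSE}(\sigma^2, \delta, \rho, G, \tau)$. According to Section 1.4.2, we define the minimax noise
sensitivity as

$$\mathrm{NS}_{SE}(\delta, \rho) = \min_{\tau} \sup_{\sigma > 0} \sup_{\nu \in \mathcal{F}_{\rho\delta}} \frac{\mathrm{fMSE}(\sigma^2, \delta, \rho, G, \tau)}{\sigma^2}.$$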



The noise sensitivity of the state evolution can be easily evaluated from M (e). The following theorem 
characterizes this relation. 

Theorem 3.10. Let pmse(o~) be the value of p satisfying M^(p5) = 5. Then, for p < pmse we have 

NS SE (5,p) 



l-M b (<5p)/<5' 
and for p > pmse(8), NS SE (5,p) = oo. 

The proof of this theorem follows exactly the same lines as the proof of Proposition 3.1 in [7] , and therefore 
we skip it for the sake of brevity. The contour lines of this noise sensitivity function are displayed in Figure m.

Lemma 3.11. For every $\delta \in [0,1]$ we have

\[ \rho_{MSE}(\delta) = \rho_{SE}(\delta). \]

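Proof. The proof is a simple comparison of the formulas. We first know that $\rho_{MSE}$ is derived from the following equation:

\[ \min_{\tau} \; 2(1-\rho\delta) \int_{\omega > \tau} \omega(\omega-\tau)^2 e^{-\omega^2}\,d\omega + \rho\delta(1+\tau^2) = \delta. \]

On the other hand, since $\Psi(m)$ is a concave function of $m$, $\rho_{SE}(\delta,\tau)$ is derived from $\frac{d\Psi(m)}{dm}\big|_{m=0} = 1$. This derivative is equal to

\[ \frac{d\Psi(m)}{dm}\bigg|_{m=0} = \frac{2(1-\rho\delta)}{\delta} \int_{\omega > \tau} \omega(\omega-\tau)^2 e^{-\omega^2}\,d\omega + \frac{\rho\delta}{\delta}(1+\tau^2). \]

Also, $\rho_{SE}(\delta) = \sup_{\tau} \rho_{SE}(\delta,\tau)$. However, in order to obtain the highest $\rho$ we should minimize the above expression over $\tau$. Therefore, both $\rho_{SE}(\delta)$ and $\rho_{MSE}(\delta)$ satisfy the same equations and are exactly the same. □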



3.4 Connection between the state evolution, CAMP, and c-LASSO 

There is a strong connection between the state evolution framework, the CAMP algorithm, and c-LASSO. Recently, [10] proved that the state evolution predicts the asymptotic performance of the AMP algorithm when the measurement matrix is iid Gaussian. The result also holds for complex Gaussian matrices and complex input vectors, and the proof is essentially the same as in the real case. As in [3], we conjecture that the SE predictions are correct for a "wide" class of random matrices. We show evidence for this claim in Section 4. Here, for the sake of completeness, we quote the result of [10] in the complex setting. Let $\psi : \mathbb{C}^2 \to \mathbb{R}$ be a pseudo-Lipschitz function^5 and let $s_o$ and $x^t$ denote the original vector and the estimate of CAMP at time $t$, respectively. Suppose that the empirical distribution of $s_o$ converges to the distribution $F_s$. We then have



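\[ \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \psi(x^t_i, s_{o,i}) = E\,\psi\big(\eta(X + \sqrt{\mathrm{npi}_t}\,Z_1 + i\sqrt{\mathrm{npi}_t}\,Z_2;\ \tau\sqrt{\mathrm{npi}_t}),\ X\big) \tag{6} \]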



almost surely, where $Z_1 + iZ_2 \sim CN(0, 1)$ and $X \sim F_s$ are independent complex random variables. It is also simple to extend the results of [7] and [45] on the connection between message passing algorithms and LASSO to the complex setting. For a given value of $\tau$, suppose that the fixed point of the state evolution is denoted by $m^*$. Define $\lambda(\tau)$ as

\[ \lambda(\tau) = \tau\sqrt{m^*}\,\Big(1 - \frac{1}{2\delta}\,E\big(\partial_1\eta^R(X + \sqrt{m^*}Z_1 + i\sqrt{m^*}Z_2;\ \tau\sqrt{m^*}) + \partial_2\eta^I(X + \sqrt{m^*}Z_1 + i\sqrt{m^*}Z_2;\ \tau\sqrt{m^*})\big)\Big), \tag{7} \]

and suppose that $\hat{x}(\lambda(\tau))$ is the solution of c-LASSO with the regularization parameter $\lambda(\tau)$. We then have

\[ \lim_{t \to \infty} \lim_{N \to \infty} \frac{1}{N}\,\|x^t - \hat{x}(\lambda(\tau))\|_2^2 = 0 \]

almost surely. For more information on the connections between these algorithms, see [7] and [45].



5 $\psi : \mathbb{C}^2 \to \mathbb{R}$ is pseudo-Lipschitz if and only if $|\psi(x) - \psi(y)| \le L(1 + \|x\|_2 + \|y\|_2)\|x - y\|_2$.

Figure 4: Comparison of $\rho_{SE}(\delta)$ with the empirical phase transition of c-LASSO [46] (left) and CAMP (right). There is a close match between the theoretical prediction and the empirical results from Monte Carlo simulations.



4 Simulations 



As explained in Section 3.4, our theoretical results show that if the elements of the matrix are iid Gaussian, then state evolution accurately predicts the performance of the CAMP and c-LASSO algorithms. In this section we show evidence suggesting that the theoretical framework is applicable to a wider class of measurement matrices. We then investigate the dependence of the empirical phase transition on the input distribution for medium problem sizes.



4.1 Measurement matrix simulations 

We investigate the effect of the measurement matrix distribution on the performance of CAMP and c-LASSO in two different cases. First, we consider the case where the measurements are noise-free. We postpone a discussion of measurement noise to Section 4.1.2.



4.1.1 Noise-free measurements 

Suppose that the measurements are noise-free. Our goal is to empirically measure the phase transition curves of c-LASSO and CAMP on the measurement matrices provided in Table 1. To characterize the phase transition of an algorithm, we do the following:

- We consider 33 equispaced values of $\delta$ between 0 and 1.

- For each value of $\delta$, we calculate $\rho_{SE}(\delta)$ from the theoretical framework and then consider 41 equispaced values of $\rho$ in $[\rho_{SE}(\delta) - 0.2, \rho_{SE}(\delta) + 0.2]$.

- We fix $N = 1000$, and for any value of $\rho$ and $\delta$, we calculate $n = \lfloor \delta N \rfloor$ and $k = \lfloor \rho\delta N \rfloor$.

- We draw $M = 20$ independent random matrices from the given distribution and for each matrix we construct a random input vector $s_o$ with a given distribution. We then form $y = As_o$ and recover $s_o$ from $y$, $A$ by either c-BP or optimally tuned CAMP to obtain $\hat{x}$. The matrix distributions and coefficient distributions we consider in our simulations are specified in Tables 1 and 2, respectively.

- For each $\delta$, $\rho$ and Monte Carlo sample $j$ we define a success variable $S_{\delta,\rho,j} = I(\|\hat{x} - s_o\|_2 / \|s_o\|_2 < \mathrm{tol})$, and using these variables we calculate the success probability $\hat{p}(\delta,\rho) = \frac{1}{M}\sum_j S_{\delta,\rho,j}$. This provides an empirical estimate of the probability of correct recovery. The value of tol in our case is set to $10^{-4}$.

- For a fixed value of $\delta$, we fit a logistic regression function to $\hat{p}(\delta,\rho)$ to obtain $\hat{p}_\delta(\rho)$. Then we find the value $\hat{\rho}_\delta$ for which $\hat{p}_\delta(\rho) = 0.5$.
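The last two steps of the procedure above can be sketched as follows. This is an illustrative implementation and not the code used for the experiments: the two-parameter logistic model, the Newton iteration and all function names are assumptions, and the success counts are taken as given.

```python
import math

def success_probability(flags):
    """Fraction of Monte Carlo trials flagged as successful recoveries."""
    return sum(1 for f in flags if f) / len(flags)

def sigmoid(t):
    """Logistic function with a clamped argument to avoid overflow."""
    t = max(-50.0, min(50.0, t))
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(rhos, successes, trials, iters=50):
    """Fit P(success | rho) = sigmoid(a + b * rho) by Newton's method.

    successes[i] is the number of successes out of `trials` at rhos[i].
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for r, s in zip(rhos, successes):
            p = sigmoid(a + b * r)
            w = trials * p * (1.0 - p)
            g0 += s - trials * p          # gradient w.r.t. a
            g1 += (s - trials * p) * r    # gradient w.r.t. b
            h00 += w
            h01 += w * r
            h11 += w * r * r
        det = h00 * h11 - h01 * h01
        if det < 1e-12:                   # degenerate Hessian: stop
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b

def transition_point(a, b):
    """Value of rho at which the fitted success probability equals 0.5."""
    return -a / b
```

Given the success counts at each value of $\rho$ for a fixed $\delta$, `transition_point(*fit_logistic(rhos, counts, 20))` gives the estimated location of the empirical phase transition.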

See [6] for a more detailed discussion of this approach. For the c-LASSO algorithm we reproduce the experiments of [46] and therefore use the one-L1 algorithm [46]. Although the experiments show that, in the regions of the phase diagram where the noise sensitivity is less than 2, even 100 iterations of CAMP are enough for convergence, we run 3000 iterations since our goal is to measure the phase transition.

Figure 4 compares the phase transition of c-LASSO and CAMP on the ensembles specified in Table 1 with the theoretical prediction of this paper. In this simulation the coefficient ensemble is UP. Clearly, the empirical and theoretical phase transitions of the algorithms are very close.



Table 1: Ensembles considered for the measurement matrix A in the matrix universality ensemble experi- 
ments. 



Name 


Specification 


Gaussian 


iid elements with standard normal real and imaginary parts 


Rademacher 


iid elements with real and imaginary parts distributed according to 

\8 ^(x) + \8 p-(x) 


Ternary 


iid elements real and imaginary parts distributed according 

V 2n V 2n 



4.1.2 Noisy measurements 

In this section we aim to show that, even in the presence of noise, the matrix ensembles defined in Table 1 perform similarly. Here is the setup for our experiment:

- We set $\delta = 0.25$, $\rho = 0.1$, and $N = 1000$.

- We choose 50 different values of $\sigma$ in the range $[0.001, 0.1]$.

- We choose an $n \times N$ measurement matrix $A$ from one of the ensembles specified in Table 1.

- We draw $k$ iid elements from the CAUP ensemble for the $k = \lfloor \rho n \rfloor$ non-zero elements of the input $s_o$.

- We form the measurement vector $y = As_o + \sigma w$, where $w$ is the noise vector with iid elements from $CN(0,1)$.

- For CAMP we set $\tau = 2$. For c-LASSO we use (7) to derive the corresponding values of $\lambda$ for $\tau = 2$ in CAMP.

- We calculate the MSE $\|\hat{x} - s_o\|_2^2/N$ for each matrix ensemble and compare the results.

[Figure 5 panels; horizontal axes: Mean Square Error (Rademacher), left, and Mean Square Error (Ternary), right]

Figure 5: Comparison of the mean square error of c-LASSO for Gaussian and Rademacher matrix ensembles (left), and Gaussian and Ternary ensemble (right). The concentration of points around the y = x line confirms the universality hypothesis. The norms of residuals are equal to $5.9 \times 10^{-4}$ and $6 \times 10^{-4}$ for the left and right figures, respectively. Comparison of this figure with Figure 7 also confirms that as $N$ grows the points become more concentrated about the y = x line.



Figures 5 and 6 summarize our results. The concentration of the points along the y = x line indicates that the above matrix ensembles perform similarly. In order to provide stronger evidence, we run the above experiment with $N = 4000$. The result of this experiment is exhibited in Figures 7 and 8. It is clear from these figures that the MSE is now more concentrated around the y = x line. Results obtained for other values of the parameters confirm similar behavior.



4.2 Coefficient ensemble simulations 



According to Proposition 3.4, $\rho_{SE}(\delta,\tau)$ is independent of the distribution $G$ of the non-zero coefficients of $s_o$. We test the accuracy of this result on medium problem sizes. We fix $\delta$ to 0.1 and calculate the success probability $\hat{p}_\delta(\rho)$ for 60 equispaced values of $\rho$ between 0.1 and 0.5. For each algorithm and each value of $\rho$ we run 100 Monte Carlo trials and calculate the success rate for the Gaussian matrix and the coefficient ensembles specified in Table 2. Figure 9 summarizes our results. Simulations at other values of $\delta$ result in very similar behavior. These results are consistent with Proposition 3.4.



Table 2: Coefficient ensembles considered in coefficient ensemble experiments.

Name    Specification
UP      iid elements with amplitude 1 and uniform phase
ZP      iid elements with amplitude 1 and phase zero
GA      iid elements with standard normal real and imaginary parts
UF      iid elements with U[0, 1] real and imaginary parts

Figure 6: Comparison of the MSE of c-LASSO for Gaussian and Rademacher matrix ensembles (left), and Gaussian and Ternary ensemble (right). The concentration of points around the y = x line confirms the universality hypothesis. The norms of residuals are equal to $9.1 \times 10^{-4}$ and $9.4 \times 10^{-4}$ for the left and right figures, respectively. Comparison of this figure with Figure 8 also confirms that as $N$ grows the points become more concentrated about the y = x line.




Figure 7: Comparison of the MSE of c-LASSO for Gaussian and Rademacher matrix ensembles (left), and Gaussian and Ternary ensemble (right). The norms of residuals are $2.8 \times 10^{-4}$ and $2.3 \times 10^{-4}$. Compare with Figure 5.

Figure 8: Comparison of the MSE of c-LASSO for Gaussian and Rademacher matrix ensembles (left), and Gaussian and Ternary ensemble (right). The norms of residuals are $2 \times 10^{-4}$ and $1.8 \times 10^{-4}$, respectively. Compare with Figure 6.

Figure 9: Comparison of the phase transition of c-LASSO (left) and CAMP (right) for different coefficient ensembles specified in Table 2. $\delta = 0.1$ in this figure. These figures are in agreement with the coefficient universality hypothesis.

5 Conclusions



The problem of recovering a complex sparse signal from an undersampled set of complex measurements is considered in this paper. We accurately analyzed the asymptotic performance of the c-LASSO and CAMP algorithms. Using the state evolution framework, we proved simple expressions for the noise sensitivity and phase transition of these two algorithms. The results presented here show that substantial improvements can be achieved when the real and imaginary parts are considered jointly in the algorithm. For instance, as $\delta \to 0$ we showed that the phase transition of CAMP and c-BP is two times higher than the phase transition of r-LASSO.

A Proximity operator

For a given convex function $f : \mathbb{C}^n \to \mathbb{R}$ the proximity operator at point $x$ is defined as

\[ \mathrm{Prox}_f(x) = \arg\min_{y \in \mathbb{C}^n} \frac{1}{2}\|y - x\|_2^2 + f(y). \tag{8} \]

Proximity operators play an important role in optimization theory. For further information refer to [47] or Chapter 7 of [40]. The following lemma characterizes the proximity operator for the complex $\ell_1$-norm. This proximity operator has been used in several other papers [4], [15], [17], [46].

Lemma A.1. Let $f$ denote the complex $\ell_1$-norm function, i.e., $f(x) = \sum_i \sqrt{(x_i^R)^2 + (x_i^I)^2}$. Then the proximity operator is given by

\[ \mathrm{Prox}_{\tau f}(x) = \eta(x; \tau), \]

where $\eta(u + iv; \lambda) = \Big(u + iv - \frac{\lambda(u + iv)}{\sqrt{u^2 + v^2}}\Big)\,\mathbb{1}_{\{u^2 + v^2 > \lambda^2\}}$ is applied component-wise to the vector $x$.
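For a single complex component, the operator of Lemma A.1 can be written in a few lines. The sketch below is illustrative only; the function names are hypothetical, and `prox_objective` is included merely to check numerically that the thresholded point does not lose to nearby candidates.

```python
def complex_soft_threshold(x, lam):
    """eta(x; lam): shrink the modulus of the complex number x by lam, keep its phase."""
    a = abs(x)
    if a <= lam:
        return 0j
    return x * (1.0 - lam / a)

def prox_objective(y, x, lam):
    """Scalar version of the objective in (8) with f(y) = lam * |y|."""
    return 0.5 * abs(y - x) ** 2 + lam * abs(y)
```

For instance, `complex_soft_threshold(3 + 4j, 1.0)` scales the input by $1 - 1/5$, while any input with modulus at most `lam` is mapped to zero.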



Proof. Since (8) can be decoupled into the elements of the vectors, we solve the optimization for a single component. In other words, we solve the optimization in (8) for $x, y \in \mathbb{C}$. Suppose that the optimal $y_*$ satisfies $(y_*^R)^2 + (y_*^I)^2 > 0$. Then the function $\sqrt{(y^R)^2 + (y^I)^2}$ is differentiable and the optimal solution satisfies

\[ x^R - y_*^R = \frac{\lambda y_*^R}{\sqrt{(y_*^R)^2 + (y_*^I)^2}}, \qquad x^I - y_*^I = \frac{\lambda y_*^I}{\sqrt{(y_*^R)^2 + (y_*^I)^2}}. \tag{9} \]

Combining the two equations in (9) we obtain $y_*^R x^I = x^R y_*^I$. Replacing this in (9) we have $y_*^R = x^R - \frac{\lambda x^R}{\sqrt{(x^R)^2 + (x^I)^2}}$ and $y_*^I = x^I - \frac{\lambda x^I}{\sqrt{(x^R)^2 + (x^I)^2}}$. It is clear that if $\sqrt{(x^R)^2 + (x^I)^2} < \lambda$ the signs of $y_*^R$ and $x^R$ will be opposite, which is in contradiction with (9). Therefore, if $\sqrt{(x^R)^2 + (x^I)^2} < \lambda$, both $y_*^R$ and $y_*^I$ are zero. It can be easily checked that $(0, 0)$ satisfies the subgradient optimality condition. □

B Proof of Proposition 2.1

Let

\[ \eta^R(x + iy) = \mathcal{R}(\eta(x + iy; \lambda)), \qquad \eta^I(x + iy) = \mathcal{I}(\eta(x + iy; \lambda)), \]

denote the real and imaginary parts of the complex soft thresholding function. Define

\[ \partial_1\eta^R = \frac{\partial \eta^R(x + iy)}{\partial x}, \qquad \partial_2\eta^R = \frac{\partial \eta^R(x + iy)}{\partial y}, \]

\[ \partial_1\eta^I = \frac{\partial \eta^I(x + iy)}{\partial x}, \qquad \partial_2\eta^I = \frac{\partial \eta^I(x + iy)}{\partial y}. \]

We first simplify the expression for 4-*%' 



'a ^ ^ AajXj ^ ^ A a j5xj_i a + AaiX^ . 
je[N] je[N] "TT"' 



Also, 



v ( Y, A u4 + J2 A *u 5z Ui - 4*4; n) + o(i/N) 

be[n] be[n] 

viYl A *bi z b + E A u 5 4^i\n) 



6e[n] 66 [n] 



66 [n] 66[n] 
66 [n] 66 [n] 

-niAi^d^iY^AtA + A yzU) 

be [n] be [n] 

-liA^zD^iY, A ti4 + E A *S*Li) + 0(1/N). (10) 

66 [n] 66 H 

Ylb A ti^ z b-^i = x \i smce <^%-+i = Afoxl, and the columns of the matrix are assumed to be normalized. It is 
also clear that 

5xUa = -nAl^d^iY^AlA+^AtM^) 

be[n] be[n] 

-i(A* ai zi)d 2 n R (J2 a&4+ £ AyzU) 

be [n] be [n] 

-^{Ki4)div I C£ J A u4+ E A u §z Ui) 

be[n] be[n] 

-liA^zD^iY, A ti4 + E A u 5z Ui)- ( n ) 

66 [n] 66 [n] 
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Also, 

z a = Va~ A a jXj — A a j5Xj_ ¥a 
j 

By plugging (11) into (12), we obtain



(12) 



j 

j 



which completes the proof. 



C Proof of Lemma 3.2 



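For notational simplicity, consider the case m = 1. The proof for the other values of m is exactly the same.

\[ E|\eta(X + Z_1 + iZ_2; \tau) - X|^2 = (1-\epsilon)\,E|\eta(Z_1 + iZ_2; \tau)|^2 + \epsilon\,E\,E_{\mu,\theta}|\eta(\mu e^{j\theta} + Z_1 + iZ_2; \tau) - \mu e^{j\theta}|^2, \tag{13} \]

where $E_{\mu,\theta}$ denotes the conditional expectation given the variables $\mu, \theta$. The first term in (13) is independent of the phase and therefore we should prove that the second term is independent as well. Define

\[ \Phi(\mu, \theta) \triangleq E_{\mu,\theta}\big(|\eta(\mu e^{j\theta} + Z_1 + iZ_2; \tau) - \mu e^{j\theta}|^2\big). \tag{14} \]

We prove that $\Phi$ is independent of $\theta$. Define $z = (z_r, z_c)$, $dz = dz_r\,dz_c$. We will use the following notation:

\[ a_z = \sqrt{(\mu\cos\theta + z_r)^2 + (\mu\sin\theta + z_c)^2}, \qquad c_r \triangleq \frac{\mu\cos\theta + z_r}{a_z}, \qquad c_c \triangleq \frac{\mu\sin\theta + z_c}{a_z}. \]

Define the two sets $A_\tau = \{(z_r, z_c) \mid a_z < \tau\}$ and $A_\tau^c = \mathbb{R}^2 \setminus A_\tau$. We have

\[ \Phi(\mu, \theta) = \int_{z \in A_\tau} \mu^2\,\frac{1}{\pi} e^{-(z_r^2 + z_c^2)}\,dz + \int_{z \in A_\tau^c} |z_r + iz_c - \tau c_r - i\tau c_c|^2\,\frac{1}{\pi} e^{-(z_r^2 + z_c^2)}\,dz. \tag{15} \]

Define $\beta = \mu\cos\theta + z_r$ and $\gamma = \mu\sin\theta + z_c$. We obtain

\begin{align*}
\int_{z \in A_\tau^c} & |z_r + iz_c - \tau c_r - i\tau c_c|^2\,\frac{1}{\pi} e^{-(z_r^2 + z_c^2)}\,dz_r\,dz_c \\
&= \iint_{\sqrt{\beta^2+\gamma^2} > \tau} \Big|\beta - \mu\cos\theta + i(\gamma - \mu\sin\theta) - \frac{\tau\beta}{\sqrt{\beta^2+\gamma^2}} - \frac{i\tau\gamma}{\sqrt{\beta^2+\gamma^2}}\Big|^2\,\frac{1}{\pi} e^{-(\beta - \mu\cos\theta)^2 - (\gamma - \mu\sin\theta)^2}\,d\beta\,d\gamma \\
&= \int_{\phi=0}^{2\pi}\int_{r > \tau} |(r-\tau)\cos\phi - \mu\cos\theta + i((r-\tau)\sin\phi - \mu\sin\theta)|^2\,\frac{1}{\pi} e^{-(r\cos\phi - \mu\cos\theta)^2 - (r\sin\phi - \mu\sin\theta)^2}\,r\,dr\,d\phi \\
&= \int_{\phi=0}^{2\pi}\int_{r > \tau} \big[(r-\tau)^2 + \mu^2 - 2\mu(r-\tau)\cos(\theta - \phi)\big]\,\frac{1}{\pi} e^{-r^2 - \mu^2 + 2r\mu\cos(\theta-\phi)}\,r\,dr\,d\phi.
\end{align*}

Periodicity of the cosine function proves that this term is also independent of the phase $\theta$, and the proof is complete.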

D Concavity of the $\Psi$ function

Lemma D.1. The function $\Psi(m)$ is concave with respect to $m$.

Proof. For notational simplicity define $\sigma = \sqrt{m}$, $X_\sigma = X/\sigma$, and $A_\sigma = |X_\sigma + Z_1 + iZ_2|$. We have

\[ \Psi(\sigma^2) = \sigma^2 E\big(|\eta(X_\sigma + Z_1 + iZ_2; \tau) - X_\sigma|^2\big) = \sigma^2 E\big(E_X(|\eta(X_\sigma + Z_1 + iZ_2; \tau) - X_\sigma|^2)\big). \]

We first prove that $\Psi_X(\sigma^2) = \sigma^2 E_X\big(|\eta(X_\sigma + Z_1 + iZ_2; \tau) - X_\sigma|^2\big)$ is concave with respect to $\sigma^2$:

d ^ 2 g2) = E x |r ? (X (7 + Z 1 +iZ 2 ;r)-X (J | 2 + |^E x |r ? (X (7 + Zi+iZ 2 ;r)-X (7 | 2 

= Ex^pG + Zi+iZ^r)-^ 2 

-X CT E X (r/f + Zi + iZ 2 ; r) - l) {n R {X a + Zi + iZ 2 ; t) - X a ) . 

It is therefore easy to see that 

= J.F..Y {vi(X • Z ; • iZ 2 ;r) 1) (,/'•';. V,, • Z, • iZ 2 :r) X c ) 



d 2 cr 2 



- -^E x (r?((X CT + Zi + iZ 2 ; r)) fapk + Zi + iZ 2 ; r)) 
u 

X 2 / r> N n2 

X 2 



+ ^E x (rtf (X a + Zx + iZ 2 ;r) - l)' 
+ ^E x (rtfpk + Zi + iZ 2 ;r)) 2 



(16) 
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We have 



r ] R (X a + Z 1 +Z 2 ;r) 
r l I 1 (X a + Z 1 +iZ 2 ;T) 
r j I (X a + Zi + iZ 2 ;r) 



[I 



tZ$ 
A 3 



)I(A > t) 



(X a + Z 1 - T -^±M )I{A >r) 



r{X a + Z 1 )Z 2 
A 3 

Z 2 - T -^I(A > r) 



A 

I{A > T, 



Plugging (17) into (16) we have 



Similarly, 



E x (r£ \X a + Z X + iZ 2 ; r) - l) ( V R (X a + Z x + iZ 2 ;r) - X a ) 

X ( tXZ 2 \ 
+ E X (4(X <r + Z 1 + iZ 2 ;r)) [r] 1 (X a + Zt + iZ 2 ;r)) = -E x \I(A<t) + — . 

E x (r] R (* + Z!+ iZ 2 ; r)-lj + E x (r&{* + Z x + iZ 2 - t)\ 



E X (l(A <r)+ T ^1(A >r) + ^ > r) 



Combining (18) and (19) we obtain 

d 2 a 2 



tZ 2 t 2 ZI r 2 (X/a + Z 1 ) 2 Z 2 



+ 



+ 



A e 



A 3 A 6 
E x (-^ + lA^)I(A>r)<0. 



)I(A > t) 



A® 



(17) 



(18) 



(19) 



(20) 



Finally, we use the fact that a convex combination of concave functions is concave to extend the concavity from $\Psi_X(m)$ to $\Psi(m)$, and the proof is complete. □



E Proof of Theorem 3.5 



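As proved in Lemma D.1, $\Psi(\sigma^2)$ is a concave function. Furthermore, $\Psi(0) = 0$. Therefore, 0 is a stable fixed point if and only if $\frac{d\Psi}{dm}\big|_{m=0} < 1$. It is straightforward to calculate the derivative at zero and confirm that

\[ \frac{d\Psi}{dm}\bigg|_{m=0} = \frac{\rho\delta(1 + \tau^2)}{\delta} + \frac{1 - \rho\delta}{\delta}\,E|\eta(Z_1 + iZ_2; \tau)|^2. \tag{21} \]

Since $Z_1, Z_2 \sim N(0, 1/2)$ and are independent, the phase is uniform and the amplitude has a Rayleigh distribution. Therefore, we have

\[ E|\eta(Z_1 + iZ_2; \tau)|^2 = 2\int_{\omega=\tau}^{\infty} \omega(\omega - \tau)^2 e^{-\omega^2}\,d\omega. \tag{22} \]

Plugging (22) in (21) and setting the derivative to 1 we obtain

\[ \rho = \frac{\delta - 2\int_{\tau}^{\infty} \omega(\omega - \tau)^2 e^{-\omega^2}\,d\omega}{\delta\big(1 + \tau^2 - 2\int_{\tau}^{\infty} \omega(\omega - \tau)^2 e^{-\omega^2}\,d\omega\big)}. \]

Therefore the optimal $\tau_*$ that achieves the highest phase transition satisfies

\[ -4\int_{\tau_*}^{\infty} \omega(\tau_* - \omega)e^{-\omega^2}\,d\omega\,\Big(1 + \tau_*^2 - 2\int_{\tau_*}^{\infty} \omega(\omega - \tau_*)^2 e^{-\omega^2}\,d\omega\Big) = \Big(2\tau_* - 4\int_{\tau_*}^{\infty} \omega(\tau_* - \omega)e^{-\omega^2}\,d\omega\Big)\Big(\delta - 2\int_{\tau_*}^{\infty} \omega(\omega - \tau_*)^2 e^{-\omega^2}\,d\omega\Big), \]

which in turn results in

\[ \delta = \frac{4(1 + \tau_*^2)\int_{\tau_*}^{\infty} \omega(\tau_* - \omega)e^{-\omega^2}\,d\omega - 4\tau_*\int_{\tau_*}^{\infty} \omega(\omega - \tau_*)^2 e^{-\omega^2}\,d\omega}{-2\tau_* + 4\int_{\tau_*}^{\infty} \omega(\tau_* - \omega)e^{-\omega^2}\,d\omega}. \]

Plugging $\delta$ into the formula for $\rho$ we obtain the formula in the theorem.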
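The phase transition curve implied by this derivation can be evaluated numerically by maximizing $\rho$ over $\tau$ for a given $\delta$. The sketch below is an illustration and not the authors' procedure: it uses the closed form $2\int_\tau^\infty \omega(\omega-\tau)^2 e^{-\omega^2}d\omega = e^{-\tau^2} - \sqrt{\pi}\,\tau\,\mathrm{erfc}(\tau)$ (integration by parts), a coarse grid followed by a ternary search that assumes the objective is unimodal near the grid maximizer, and hypothetical names and search bounds.

```python
import math

def noise_only_risk(tau):
    """E|eta(Z1 + iZ2; tau)|^2 from (22), in closed form."""
    return math.exp(-tau * tau) - math.sqrt(math.pi) * tau * math.erfc(tau)

def rho_given_tau(delta, tau):
    """Value of rho at which the derivative of Psi at zero equals 1."""
    g = noise_only_risk(tau)
    return (delta - g) / (delta * (1.0 + tau * tau - g))

def rho_se(delta, tau_min=1e-6, tau_max=10.0, grid=4000):
    """Approximate sup over tau of rho_given_tau(delta, tau)."""
    taus = [tau_min + (tau_max - tau_min) * i / grid for i in range(grid + 1)]
    best = max(range(grid + 1), key=lambda i: rho_given_tau(delta, taus[i]))
    lo = taus[max(best - 1, 0)]
    hi = taus[min(best + 1, grid)]
    for _ in range(100):  # ternary search inside the bracketing grid cell
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if rho_given_tau(delta, a) < rho_given_tau(delta, b):
            lo = a
        else:
            hi = b
    return rho_given_tau(delta, 0.5 * (lo + hi))
```

Calling `rho_se(delta)` on a grid of undersampling ratios traces out the predicted curve; the value increases with `delta` and never exceeds 1.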



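F Proof of Theorem 3.6

The proof is an application of Laplace's method. It is clear from Theorem 3.5 that as $\lambda \to \infty$, $\rho \to 0$ and $\delta \to 0$. Therefore, we should calculate the leading terms of $\rho$ and $\delta$. Using Laplace's method, we can prove that

\[ \int_{\lambda}^{\infty} \omega(\lambda - \omega)e^{-\omega^2}\,d\omega \sim -\frac{e^{-\lambda^2}}{4\lambda} + \frac{e^{-\lambda^2}}{8\lambda^3}, \qquad \lambda \to \infty, \tag{23} \]

\[ \int_{\lambda}^{\infty} \omega(\lambda - \omega)^2 e^{-\omega^2}\,d\omega \sim \frac{e^{-\lambda^2}}{4\lambda^2}, \qquad \lambda \to \infty. \tag{24} \]

Plugging (23) and (24) into the formulas we have for $\rho$ and $\delta$ in Theorem 3.5 we obtain

\[ \delta \sim \frac{e^{-\lambda^2}}{2}, \qquad \rho \sim \frac{1}{\lambda^2}, \qquad \lambda \to \infty, \tag{25} \]

which completes the proof.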



G Proof of Lemma 3.7

Since we have proved that the phase of the input signal does not affect the state evolution equation, we assume that the phase is equal to zero. Hence

\[ r(\mu, \tau) = E|\eta(\mu + Z_1 + iZ_2; \tau) - \mu|^2 = E(\eta^R(\mu + Z_1 + iZ_2; \tau) - \mu)^2 + E(\eta^I(\mu + Z_1 + iZ_2; \tau))^2, \]

where $\eta^R(\mu + Z_1 + iZ_2; \tau) = \big(\mu + Z_1 - \frac{\tau(\mu + Z_1)}{A}\big) I(A > \tau)$, $\eta^I(\mu + Z_1 + iZ_2; \tau) = \big(Z_2 - \frac{\tau Z_2}{A}\big) I(A > \tau)$ and $A = \sqrt{(\mu + Z_1)^2 + Z_2^2}$. If we calculate the derivative of the risk function with respect to $\mu$, we have

\begin{align*}
\frac{dr(\mu, \tau)}{d\mu} &= 2E\Big[(\eta^R - \mu)\Big(\frac{d\eta^R}{d\mu} - 1\Big) + \eta^I\,\frac{d\eta^I}{d\mu}\Big] \\
&= 2E\Big[(\eta^R - \mu)\Big(\Big(1 - \frac{\tau Z_2^2}{A^3}\Big) I(A > \tau) - 1\Big) + \eta^I\,\frac{\tau(\mu + Z_1)Z_2}{A^3}\,I(A > \tau)\Big] \\
&= 2\mu E(I(A < \tau)) - 2E\Big[\Big(Z_1 - \frac{\tau(\mu + Z_1)}{A}\Big)\frac{\tau Z_2^2}{A^3}\,I(A > \tau)\Big] + 2E\Big[\Big(Z_2 - \frac{\tau Z_2}{A}\Big)\frac{\tau(\mu + Z_1)Z_2}{A^3}\,I(A > \tau)\Big] \\
&= 2\mu E(I(A < \tau)) + 2\mu E\Big(\frac{\tau Z_2^2}{A^3}\,I(A > \tau)\Big) \ge 0,
\end{align*}

where $\eta^R$ and $\eta^I$ are evaluated at $(\mu + Z_1 + iZ_2; \tau)$. Therefore, the risk of the complex soft thresholding is an increasing function of $\mu$. Furthermore,

\[ \frac{dr(\mu, \tau)}{d\mu^2} = \frac{1}{2\mu}\frac{dr(\mu, \tau)}{d\mu} = E(I(A < \tau)) + E\Big(\frac{\tau Z_2^2}{A^3}\,I(A > \tau)\Big). \]

It is clear that the next derivative with respect to $\mu^2$ is negative, and therefore the function is concave.



H Proof of Proposition 3.8 



Let $\nu = (1 - \epsilon)\delta_0(\mu) + \epsilon G(\mu)$. We then have

\begin{align*}
E|\eta(X + Z_1 + iZ_2; \tau) - X|^2 &= (1 - \epsilon)E|\eta(Z_1 + iZ_2; \tau)|^2 + \epsilon E_{X \sim G} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 \\
&= 2(1 - \epsilon)\int_{w=\tau}^{\infty} w(w - \tau)^2 e^{-w^2}\,dw + \epsilon E_{X \sim G} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2. \tag{26}
\end{align*}

Using Lemma 3.7 and Jensen's inequality we prove that $\{G_m(\mu)\}_{m=1}^{\infty}$, $G_m(\mu) = \delta_m(\mu)$, is the least favorable sequence of distributions, i.e., for any distribution $G$

\[ E_{X \sim G} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 \le \lim_{m \to \infty} E_{X \sim G_m} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2. \]

Toward this goal we define $\tilde{G}(\mu)$ as $\delta_{\mu_0}(\mu)$, such that $\mu_0^2 = E_G(X^2)$. In other words, $G$ and $\tilde{G}$ have the same second moments. From Jensen's inequality we have

\[ E_{X \sim G} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 \le E_{X \sim \tilde{G}} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2. \]

Furthermore, from the monotonicity of the risk function proved in Lemma 3.7, we have

\[ E_{X \sim \tilde{G}} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 \le E_{X \sim G_m} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 \qquad \forall m > \mu_0. \]

Finally, the monotone convergence theorem indicates that

\[ \lim_{m \to \infty} E_{X \sim G_m} E_X|\eta(X + Z_1 + iZ_2; \tau) - X|^2 = 1 + \tau^2. \tag{27} \]

Combining (26) and (27) completes the proof.